(NIPS 2016) Learning what and where to draw

Keyword [Location-Controllable]

Reed S E, Akata Z, Mohan S, et al. Learning what and where to draw[C]//Advances in Neural Information Processing Systems. 2016: 217-225.



1. Overview


1.1. Motivation

  • existing methods synthesize images from global constraints only (a class label or caption) and provide no control over pose or object location

This paper proposes the Generative Adversarial What-Where Network (GAWWN):

  • synthesizes images given instructions describing what content to draw and in which location
    • condition on a coarse location (bounding box), implemented with a spatial transformer network (STN)
    • condition on part locations, given as a set of normalized (x, y) keypoint coordinates


1.2. Contribution

  • a novel architecture for text- and location-controllable image synthesis
  • a text-conditional object part completion model, enabling a streamlined user interface for specifying part locations

1.3. Related Work

  • CNN (deterministic)
  • VAE, convolutional VAE, recurrent VAE (probabilistic)
  • GAN
  • STN (spatial transformer network)

1.4. Future Work

  • learn the object and part locations in an unsupervised or weakly supervised way



2. GAWWN


2.1. Bounding-Box-Conditional Text-to-Image Model



  • replicate the text embedding spatially to form an M×M×T feature map
  • warp it spatially into the normalized bounding-box coordinates (regions outside the box are all zeros); see the sketch below
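
A minimal PyTorch sketch of these two steps, assuming (B, T) text embeddings and normalized (x, y, w, h) boxes. Because the replicated map is constant over space, warping it into the box amounts to zeroing everything outside the box, so the paper's STN warp is approximated here by a simple binary mask; the function name and shapes are illustrative assumptions, not the paper's code.

```python
import torch


def spatial_text_map(text_emb, bbox, M=16):
    """Replicate a text embedding over an M x M grid and zero it outside
    a normalized bounding box.

    text_emb: (B, T) text embedding
    bbox:     (B, 4) normalized (x, y, w, h) in [0, 1]
    returns:  (B, T, M, M) feature map that is zero outside the box
    """
    B, T = text_emb.shape
    # replicate spatially: (B, T) -> (B, T, M, M)
    feat = text_emb.view(B, T, 1, 1).expand(B, T, M, M)

    # grid of normalized coordinates for each cell
    ys = torch.linspace(0.0, 1.0, M).view(1, M, 1).expand(B, M, M)
    xs = torch.linspace(0.0, 1.0, M).view(1, 1, M).expand(B, M, M)

    # binary mask: 1 inside the box, 0 outside
    x0 = bbox[:, 0].view(B, 1, 1)
    y0 = bbox[:, 1].view(B, 1, 1)
    w = bbox[:, 2].view(B, 1, 1)
    h = bbox[:, 3].view(B, 1, 1)
    mask = ((xs >= x0) & (xs <= x0 + w) & (ys >= y0) & (ys <= y0 + h)).float()

    return feat * mask.unsqueeze(1)  # (B, T, M, M)
```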

2.2. Keypoint-Conditional Text-to-Image Model



  • keypoint locations are encoded into an M×M×K spatial feature map (each channel corresponds to one part)
  • the map is max-pooled over the K part channels to obtain a binary spatial mask, which is then replicated depth-wise to gate the other feature maps; see the sketch below
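
A small sketch of this encoding under the same assumptions (PyTorch, normalized keypoints, a hypothetical keypoint_maps helper): build the M×M×K part map, take the max over the part channels, and replicate the resulting mask depth-wise.

```python
import torch


def keypoint_maps(keypoints, visible, M=16, depth=128):
    """Encode part keypoints as an M x M x K spatial map, then max over the
    part channels and replicate the result depth-wise.

    keypoints: (B, K, 2) normalized (x, y) part locations in [0, 1]
    visible:   (B, K) float flags, 1 if the part is annotated
    returns:   part_map (B, K, M, M), gate (B, depth, M, M)
    """
    B, K, _ = keypoints.shape
    part_map = torch.zeros(B, K, M, M)

    # place a 1 at the grid cell of every visible keypoint
    cols = (keypoints[..., 0] * (M - 1)).round().long().clamp(0, M - 1)  # x -> column
    rows = (keypoints[..., 1] * (M - 1)).round().long().clamp(0, M - 1)  # y -> row
    for b in range(B):
        for k in range(K):
            if visible[b, k] > 0:
                part_map[b, k, rows[b, k], cols[b, k]] = 1.0

    # max over the K part channels -> binary "any part here" mask,
    # replicated depth-wise so it can gate other feature tensors
    mask = part_map.max(dim=1, keepdim=True).values   # (B, 1, M, M)
    gate = mask.expand(B, depth, M, M)                # (B, depth, M, M)
    return part_map, gate
```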

2.3. Conditional Keypoint Generation Model

It is not practical to require users to enter every single keypoint of the object parts they wish to be drawn. It would therefore be useful to have access to all of the conditional distributions of unobserved keypoints given a subset of observed keypoints and the text description.
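
A hedged sketch of what such a completion model could look like: a small MLP generator that keeps the observed keypoints via a binary switch vector and samples the rest conditioned on the text and noise. The class name, layer sizes, and the MLP form are assumptions for illustration, not the paper's architecture.

```python
import torch
import torch.nn as nn


class KeypointCompletion(nn.Module):
    """Text-conditional keypoint completion: observed keypoints are passed
    through unchanged, unobserved ones are filled in conditioned on the text,
    the observed subset, and a noise vector."""

    def __init__(self, K=15, text_dim=128, noise_dim=32, hidden=256):
        super().__init__()
        self.K = K
        in_dim = K * 2 + K + text_dim + noise_dim  # masked keypoints + switch + text + z
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, K * 2), nn.Sigmoid(),  # normalized (x, y) in [0, 1]
        )

    def forward(self, obs_kpts, switch, text_emb, z):
        # obs_kpts: (B, K, 2), switch: (B, K) float with 1 = observed,
        # text_emb: (B, text_dim), z: (B, noise_dim)
        B = obs_kpts.size(0)
        masked = obs_kpts * switch.unsqueeze(-1)  # hide unobserved coordinates
        inp = torch.cat([masked.view(B, -1), switch, text_emb, z], dim=1)
        gen = self.net(inp).view(B, self.K, 2)
        # keep observed keypoints, use generated coordinates for the rest
        s = switch.unsqueeze(-1)
        return s * obs_kpts + (1 - s) * gen
```

Gating the output with the switch means user-provided keypoints are never altered; the generator only has to propose the missing parts.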



3. Experiments


3.1. Details

  • text embedding: char-CNN-GRU
  • Adam optimizer with learning rate 0.0002, batch size 16; see the sketch below
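
A minimal sketch of this optimizer setup in PyTorch; the generator/discriminator placeholders and the beta values are assumptions, only the learning rate and batch size come from these notes.

```python
import torch
import torch.nn as nn

# stand-in modules for the GAWWN generator / discriminator
generator = nn.Linear(100, 64 * 64 * 3)
discriminator = nn.Linear(64 * 64 * 3, 1)

# reported setup: Adam with learning rate 0.0002, batch size 16
# (beta1 = 0.5 is a common GAN choice and an assumption here)
batch_size = 16
g_optim = torch.optim.Adam(generator.parameters(), lr=2e-4, betas=(0.5, 0.999))
d_optim = torch.optim.Adam(discriminator.parameters(), lr=2e-4, betas=(0.5, 0.999))
```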

3.2. With Bounding Box



3.3. Via Keypoints




3.4. Comparison